probability distributions on continuous variables. Three types of biases
have been identified: too many true values falling into the extreme tails
of the distributions, a displacement toward 50% for distributions assessed
on percentages, and a general tendency to underestimate. This study explored
the nature of these biases with particular emphasis on how they interact and
how they are affected by the procedure used to elicit the distributions.

Two procedures were used to elicit subjective probability distributions
on percentage variables. In a fractile procedure, subjects were
asked to judge values of the unknown percentage that corresponded to fixed
levels of their cumulative subjective probability distributions, while in
an odds procedure, subjects judged the cumulative odds for fixed values of
the unknown percentages. For all the unknown percentages, p%, distributions
were assessed for both p% and 1-p%. The extent to which these assessments
summed to less than 100% indicated a bias toward underestimation.

Underestimation  was  generally  found  when  the  fractile  elicitation  was 
used,  but  not  when  the  odds  procedure  was  used.  Also,  too  many  true  values 
fell  into  the  extreme  tails  of  the  distributions  elicited  by  the  fractile 
procedure,  but  no  similar  bias  was  found  in  distributions  elicited  by  the 
odds  procedure.  The  displacement  toward  50%  was  found  in  distributions 
elicited  by  both  procedures.  This  bias  also  appeared  to  be  the  cause  of  a 
considerable number of the true values in the extreme tails of the
distributions. Many of the differences in the biases found when different
elicitation  procedures  were  used  can  probably  be  accounted  for  by  subjects 
avoiding  extreme  responses  and  odds  judgments  between  1:1  and  2:1. 
Disclaimer


The  views  and  conclusions  contained  in  this  document  are  those  of  the 
authors  and  should  not  be  interpreted  as  necessarily  representing  the 
official  policies,  either  expressed  or  implied,  of  the  Advanced  Research 
Projects  Agency  or  of  the  United  States  Government. 


I.  Introduction 


Opinions, beliefs, and judgments are most often expressed in non-numerical
statements such as "It probably is...", "I am pretty sure...",
"He is not likely...", and so forth. We also use expressions such as
"million to one shot" and "fifty-fifty chance", which are semi-numerical
expressions not ordinarily used in the literal sense. Beliefs concerning
the likelihood of uncertain events are a central basis for making decisions.
For good and consistent decisions to be made, these beliefs need an accurate
and precise representation, which is not provided by such statements.

Decision analysis has been developed in the last two decades as a method to
assist decision makers in making decisions under uncertain conditions. One
of the tools of decision analysis is the use of subjective probability as
a numerical expression of uncertainty about relevant variables. This
expression of uncertainty provides a precise representation, but the accuracy
is open to discussion and can be determined only by empirical testing.

Two types of accuracy are involved in subjective probability statements:
the correspondence of the probability statement to the true beliefs
of the assessor, and the correspondence of the statement to what actually
happens. Murphy and Winkler (1970) discussed two aspects of the latter
type of accuracy: primary and secondary validity. "Primary validity refers
to the correspondence between the statement and the relevant observation on
an individual basis, while secondary validity refers to the correspondence
between collections of identical (or similar) statements and the relevant
observed relative frequencies on a collective basis" (p. 281). Primary
validity is related to the accuracy of the subject's judgment about the
occurrence of a particular event. Prediction of rain tomorrow with
probability of .85 is more accurate than prediction of rain with probability
.55 if rain actually occurs. That is, probabilities should be extreme in
favor of the event that actually occurs. Secondary validity is related to
the bias of the assessor's statements. It typically is determined by
comparing the observed relative frequencies of occurrence with the stated
subjective probabilities. For example, an assessor is said to be biased if
the relative frequency of events judged to have a probability of .30 of
occurring differs substantially from 30%. Secondary validity, also referred
to as calibration, realism, and external validity, has been the subject
of considerable research (for a review, see Lichtenstein, Fischhoff, and
Phillips, in press).
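
As a rough illustration of how secondary validity (calibration) can be
checked, the following Python sketch groups forecasts by their stated
probability and compares each stated value with the observed relative
frequency. The function name `calibration_table` and the forecast data are
hypothetical and serve only to make the definition concrete.

```python
from collections import defaultdict


def calibration_table(forecasts):
    """Map each stated probability to (observed relative frequency, count).

    `forecasts` is an iterable of (stated_probability, occurred) pairs.
    """
    groups = defaultdict(list)
    for stated, occurred in forecasts:
        groups[stated].append(1 if occurred else 0)
    return {p: (sum(hits) / len(hits), len(hits))
            for p, hits in sorted(groups.items())}


# Hypothetical data: ten events judged to have probability .30,
# of which five actually occurred.
forecasts = [(0.30, True)] * 5 + [(0.30, False)] * 5
table = calibration_table(forecasts)
observed, n = table[0.30]
print(f"stated .30, observed {observed:.2f} over {n} events")
```

In this invented case the observed relative frequency (.50) differs
substantially from the stated .30, so the assessor would be called biased
in the sense described above.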

Probabilistic judgments are of two kinds. Assessments may be made
for discrete categories, such as "success or failure", "rain or no rain",
"win or lose". Or assessments may be made on continuous variables such as
"the lowest temperature tomorrow", "the price of oil in 1980", "the number
of murders in Los Angeles during 1977". This paper is concerned with
biases (calibration) in the latter type of assessments.

Probably the best known and most extensively studied bias of this
type is what has been described as the tendency of subjectively assessed
distributions to be too tight. That is, a comparatively high percentage
of true values fall into the tail areas of the subjective probability
distributions, which are assigned relatively low probabilities. This bias,
which seems to indicate the assessed distributions express more certainty
than is justified by the knowledge of the assessor, was first demonstrated
by Alpert and Raiffa (1969). The distributions assessed on a variety of
unknown quantities by students in Harvard's MBA program showed that 42.6%
of the true values fell into the extreme tails (less than the .01 fractile
or more than the .99 fractile) of the subjective distributions, where only
2% should have been if the assessors were perfectly calibrated. This
finding subsequently has been confirmed by additional experiments (Schaefer
and Borcherding, 1973; Seaver, von Winterfeldt, and Edwards, 1975;
Selvidge, 1975), although Seaver et al. found the existence of this bias
could be attributed at least in part to the method used for assessing the
subjective probability distributions.

These results point rather strongly to a lack of calibration in the
tails of assessed subjective probability distributions, but is this lack
of calibration also exhibited in the middle range of the distributions?
The results bearing on this question are less persuasive. Although Alpert
and Raiffa (1969) and Schaefer and Borcherding (1973) found the interquartile
ranges of the assessed distributions contained substantially fewer true
values than the expected 50%, Seaver et al. (1975) and Selvidge (1975)
found both too few and too many true values in the interquartile ranges
depending on the uncertain quantity and the assessment procedure used.
Thus, the extensiveness of this bias deserves further investigation.

A second bias that often appears in subjective probability distributions
when the uncertain quantities are percentages is a tendency to overestimate
small percentages and underestimate large percentages. In this paper we
call this bias "conservatism", referring to the tendency to avoid extremes.
Conservatism was originally used to describe the phenomenon typically
found in probability revision experiments where, after observing a set of
data, subjects do not revise their posterior probability as much as the
normative model, Bayes' Theorem, prescribes (Phillips and Edwards, 1966;
Wheeler and Edwards, 1975). Thus, the definition of conservatism used in
this paper is expanded from the original definition.

This  bias  has  most  typically  been  studied  in  the  context  of  assessing 
discrete  categories  of  events  (Lichtenstein  et  al.,  in  press).  However, 
it  has  also  been  found  in  assessing  complete  distributions  on  continuous 
events  where  the  entire  distribution  is  displaced  toward  50%.  A typical 
method  of  showing  this  bias  is  to  determine  the  number  of  true  values 
falling above and below the medians of the subjective probability
distributions. For well-calibrated assessors, 50% of the true values should fall
above  the  assessed  median  and  50%  should  fall  below,  regardless  of  the  true 
value.  Conservatism  will  be  exhibited  by  more  than  50%  of  the  true  values 
falling  below  the  medians  for  true  percentages  less  than  50%  and  vice 
versa  for  true  percentages  greater  than  50%.  Selvidge  (1975)  obtained 
exactly  these  results,  suggesting  that  conservatism  exists  in  the  assessment 
of  continuous  variables  as  well  as  discrete  variables.  This  consistent 
pattern  of  results,  however,  is  not  apparent  in  the  Alpert  and  Raiffa 
study. And Schaefer and Borcherding and Seaver et al., although showing
some forms of median displacement, do not present the data from individual
questions necessary to examine this bias. Consequently, the evidence
showing the existence of the conservatism bias is also rather inconclusive,
suggesting the need for additional research.

The possible existence of an additional bias has been indicated by
Seaver et al. They found a general tendency to underestimate unknown
percentages, with substantially more true values falling above the medians of
the assessed distributions than below. Although the data from individual
questions are not reported, the results seem striking enough to imply the
possible existence of a bias toward underestimation in addition to the
conservatism bias. Notice, however, that these two biases will conflict
when the true percentage is less than 50%, so some method of separating the
influence of these two biases is needed. One possible method is to assess
distributions for both p% and 100-p%. Assuming the conservatism bias is
equally strong for percentages above and below 50%, its effects on the two
assessments cancel in the sum, so the sum of percentages corresponding to
fixed points in the two distributions will be less than 100 if the
underestimation bias exists.
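
The logic of this separation can be sketched in a few lines of Python. The
helper `sum_discrepancy` and all numbers below are hypothetical; they only
illustrate why symmetric conservatism leaves the sum at 100 while
underestimation pulls it below 100.

```python
def sum_discrepancy(estimate_p, estimate_complement):
    """Return 100 minus the sum of the two assessed percentages.

    A positive value means the two assessments sum to less than 100,
    the signature of underestimation described in the text.
    """
    return 100.0 - (estimate_p + estimate_complement)


# Hypothetical case: true p = 20, so the complement is 80.
# Symmetric conservatism of 10 points moves the estimates to 30 and 70.
conservatism_only = sum_discrepancy(30.0, 70.0)

# A further 5-point underestimation lowers both estimates, to 25 and 65.
with_underestimation = sum_discrepancy(25.0, 65.0)

print(conservatism_only, with_underestimation)
```

Under these assumed numbers the first discrepancy is 0 and the second is
10, so only the underestimation component shows up in the sum.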

The  research  reported  in  this  paper  examines  the  extent  of  each  of 
these  three  biases,  particularly  the  underestimation  bias  which  has  not 
previously  been  systematically  investigated.  In  addition,  the  study  by 
Seaver et al. (1975) suggested that the procedure used to elicit the
distributions has an effect on the bias leading to lack of calibration in the
tails of the distributions. This study also examines how the elicitation
procedure interacts with the displacement biases of conservatism and
underestimation in addition to this dispersion bias.

One final possible relationship among these biases is also studied.
As pointed out by Schaefer and Borcherding (1973), biases in displacement
may lead to results that can be interpreted as a dispersion bias. For
example, if the conservatism bias is present in subjects assessing unknown
percentages with extreme true values, the high percentages will tend to
be underestimated, leading to a large number of true values falling in the
upper extreme tails of the subjective distributions, and the low percentages
will be overestimated, leading to a large number of true values falling in
the lower extreme tails of the subjective distributions. This will occur
regardless of the relative tightness of the assessed distributions and
will make the subjective distributions look particularly tight if statistics
are summed over unknown percentages with a wide range of true values. To
effectively assess the degree of these biases, data from subjective
distributions assessed on single unknown percentages or on groups of percentages
with similar true percentages must be considered, something that past studies
have not done.

Knowledge  of  the  existence  of  these  biases  and  the  degree  to  which  the 
evidence  of  their  existence  depends  on  the  procedure  used  to  elicit  the 
subjective  probability  distributions  is  particularly  important  if  such 
probabilities are to be used for normative decision making. "Good" decisions
must be based on "good" information and when that information is in the
form  of  subjective  probabilities,  the  probabilities  should  accurately 
reflect  the  opinions  of  the  assessor.  Or,  if  biases  persist,  they  should 
be  recognized  and  taken  into  account. 


II.  Method 

II.1. Subjects. The subjects were 66 undergraduate students who were
taking  an  introductory  course  in  psychology  at  the  University  of  Southern 
California.  Subjects  participated  in  the  experiment  on  a voluntary  basis. 

II.2. Questionnaires. Four questionnaires of twenty items each were
developed for four groups of subjects. Items in the questionnaires were
almanac questions of the type used in the experiments by Alpert and Raiffa
(1969) and Seaver, von Winterfeldt and Edwards (1975), with all the
uncertain quantities being percentages. Twenty items were selected so that
the true percentages would vary over the full range of percentages: each
5% range contained one true percentage. Two items were subsequently
eliminated due to some ambiguity in wording. A complete list of the
questions and true percentages can be found in the appendix. Each item
asking about a true percentage, p, had its counterpart which asked about
1-p. The item asking about p was called the positive item while the item
asking about 1-p was called the negative item. Two questionnaires were
made for each of two assessment procedures. Each questionnaire consisted
of ten positive and ten negative items with one true percentage in each
5% range. The questions were randomized for the questionnaires.

II.3. Elicitation procedures. Two methods for eliciting subjective
probability distributions on percentage variables were used. The groups
using the fractile elicitation procedure, FRAC, assessed fractiles of the
subjective probability distribution at five odds levels, 1:99, 1:3, 1:1,
3:1, and 99:1 for each question. These fractiles were elicited in the
following form, "What percentage of the total population of California
lived in Los Angeles County according to the 1970 census? Give the
percentage such that your odds are 3:1 that the true percentage is less than
that number." Subjects simply wrote in the percentages for the five
required fractiles.
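
The five odds levels correspond to the .01, .25, .50, .75, and .99
fractiles used later in the analysis. A minimal Python sketch of the
conversion follows; the helper name `odds_to_probability` is an
illustrative assumption, and odds "a:b" are read as a in favor to b
against, as in the sample question above.

```python
from fractions import Fraction


def odds_to_probability(odds):
    """Convert odds written 'a:b' (a in favor, b against) to a / (a + b)."""
    a, b = (int(part) for part in odds.split(":"))
    return Fraction(a, a + b)


levels = ["1:99", "1:3", "1:1", "3:1", "99:1"]
fractiles = [float(odds_to_probability(o)) for o in levels]
print(fractiles)  # [0.01, 0.25, 0.5, 0.75, 0.99]
```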

Subjects in the group using the second elicitation procedure, ODDS,
were given five percentages of the variable, asked whether the true
percentage was more likely to be above or below each given percentage, and
asked to give the odds corresponding to their certainty. The five
percentages used were selected separately for each question and included
the true percentage and four percentages selected randomly between 1 and 99.


III.  Results 

For  data  analyses,  all  positive  items  on  the  two  questionnaires  used 
by  the  FRAC  groups  were  grouped  into  FRAC+,  while  all  negative  items  were 
grouped  into  FRAC-.  Similarly,  items  on  the  two  ODDS  questionnaires  were 
grouped into ODDS+ and ODDS-.

The extent of the underestimation bias, as represented by a tendency
for the estimates of p and 100-p to sum to less than 100, can be seen
pictorially by plotting the cumulative subjective probability distributions
for both the positive and negative items. Figure 1 illustrates this with
item 17. For the FRAC groups (panel a) each point is the median percentage
given for that fractile. For the ODDS groups (panel b) each point is the
median subjective odds assessed for the specific percentage. The vertical
line represents the true percentage. For the sum of p and 100-p to be 100,
the plots of distributions of positive and negative items should coincide.
Naturally, as with all judgmental processes, some error is expected.
However, the extent to which the error is always in the same direction will
indicate a bias. In plots of the type illustrated by Figure 1, a bias
toward underestimation will be indicated by the cumulative distribution for
the positive item always falling to the left of the distribution of the
negative item. For example, in panel a, the 1:1 odds level was given a
percentage of 51% for the positive item (read from horizontal scale at
the bottom of the figure), and 32% for the negative item (read from
horizontal scale at the top of the figure). These two percentages sum to only
85% indicating underestimation. For this item, in the FRAC groups, the
underestimation is consistent across all odds levels. For the ODDS
groups, however, there is a tendency toward overestimation on this item.

Figure 1: Comparison of Median Responses Between
Positive and Negative Items (Item No. 17).

A count of the number of items showing overestimation, underestimation,
and no overall bias (the two distributions cross) for the FRAC groups
showed 0, 14, and 4 items respectively in each category, while the count
for the ODDS groups was 3, 5, and 10 items respectively. Thus,
underestimation was quite apparent in the FRAC groups but not in the ODDS groups.

Figure 2, in which the medians of the subjective distributions are
plotted as a function of the true percentages, also illustrates the
underestimation bias in the FRAC groups (panel a). In Figure 2, this bias is
shown if the median for the positive item is less than 100% minus the
median for the negative item, which was the case for 15 and 12 items
respectively for the FRAC and ODDS groups. The average discrepancy between
the positive and negative items is slightly higher for the FRAC groups
(8.01%) than for the ODDS groups (4.31%), indicating a greater disposition
toward underestimation for the FRAC groups than for the ODDS groups.

The  underestimation  bias  is  also  exhibited  in  the  assessment  of  fractiles 
other than the median. Figure 3 shows the lines regressing median
subjective responses on true percentage for all five fractiles. In all cases
the  responses  to  the  positive  items  are  less  than  100%  minus  the  responses 
to  the  negative  items. 

Conservatism is also evident in these data. The medians presented in
Figure 2 suggest this tendency, but cannot actually confirm it, since if
the individual distributions are skewed, as would be expected for the
extreme percentages, the median of the distribution would be above the
true percentage for low true percentages and below for high true percentages.

Figure 3: Comparison of Regression Lines for Responses to
Positive and Negative Items in FRAC Groups

To further examine this bias, the data were categorized according to
where the true percentage fell in the subjective distributions. Six
categories were defined by the five fractiles used by the FRAC groups:

1. Below the .01 fractile
2. Between the .01 and .25 fractiles
3. Between the .25 and .50 fractiles
4. Between the .50 and .75 fractiles
5. Between the .75 and .99 fractiles
6. Above the .99 fractile

Data from well-calibrated subjects should have approximately 1%, 24%, 25%,
25%, 24%, and 1% of the true values falling into the six categories
respectively when grouped in this manner.
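
The categorization can be sketched as follows in Python. The function name
`category` and the assessed fractiles are hypothetical; the treatment of a
true value that exactly equals an assessed fractile (placed in the lower
category here) is an assumption, since the text does not say how ties were
handled.

```python
from bisect import bisect_left


def category(true_value, fractiles):
    """Return the category (1 to 6) of `true_value`.

    `fractiles` holds the assessed .01, .25, .50, .75, and .99 fractiles
    in increasing order. Ties go to the lower category (an assumption).
    """
    return bisect_left(fractiles, true_value) + 1


# Hypothetical assessment of one unknown percentage.
assessed = [10, 25, 35, 45, 70]
print(category(5, assessed))   # below the .01 fractile
print(category(40, assessed))  # between the .50 and .75 fractiles
print(category(87, assessed))  # above the .99 fractile
```

Counting how often each category occurs across subjects and items gives
proportions that can be compared with the 1%, 24%, 25%, 25%, 24%, and 1%
expected under perfect calibration.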

Conservatism is shown by a large number of true percentages falling
into categories 1 through 3 when the true percentage is low and a large
number of true percentages falling into categories 4 through 6 when the
true percentage is high. The percentages of responses in each category
are broken down by elicitation procedure and true percentage in Table 1.

Considerable conservatism is apparent. With true percentages less
than 50%, the percentage of true percentages falling into categories 1
through 3 is 68%, 70%, 62%, and 72% for the FRAC+, FRAC-, ODDS+, and ODDS-
groups respectively, well over the 50% expected from well-calibrated
subjects. Similarly, for true percentages greater than 50%, 73%, 77%, 67%,
and 73% of the true percentages fall into categories 4 through 6 for the
four groups respectively.

TABLE 1

Percentage of Responses in Each Category

Note: For each item, the nonblank entries are listed in printed order
(FRAC categories 1 through 6, then ODDS categories 1 through 6). Blank
cells are not shown, so entries in the item rows are not aligned to
category columns. The Total rows list all six categories for each group.

Positive items (FRAC+, ODDS+)

Item No.  True %   Entries
    5        4     .61 .32 .07 .03 .62 .26 .09
    4        7     .29 .17 .35 .12 .06 .16 .72 .13
   20       14     .33 .56 .11 .06 .71 .18 .06
   16       18     .28 .44 .19 .03 .06 .03 .56 .31 .03 .06
    6       24     .23 .23 .15 .38 .09 .03 .29 .53 .06
   13       29     .11 .28 .28 .28 .06 .25 .13 .38 .25
    3       35     .08 .38 .38 .15 .03 .56 .35 .06
    2       44     .15 .23 .46 .15 .09 .12 .24 .50 .06
    1       46     .17 .22 .28 .33 .07 .07 .47 .33 .07
   19       54     .08 .03 .22 .28 .17 .22 .31 .19 .13 .28
   14       59     .36 .14 .36 .14 .06 .59 .21 .03 .12
   18       62     .06 .06 .44 .19 .13 .13 .38 .28 .09 .25
    8       66     .08 .15 .62 .15 .15 .15 .18 .53
   10       71     .11 .17 .33 .11 .28 .25 .06 .25 .38 .06
   11       78     .07 .14 .07 .43 .29 .12 .09 .29 .50
   12       87     .21 .21 .57 .12 .79 .09
   17       95     .25 .75 .06 .16 .78
   15       98     .07 .07 .21 .64 .06 .06 .38 .50

                   Category
                   1     2     3     4     5     6
Total  FRAC+     .128  .164  .187  .166  .155  .201
Total  ODDS+     .024  .303  .148  .175  .332  .019

Negative items (FRAC-, ODDS-)

Item No.  True %   Entries
    5       96     .06 .17 .78 .16 .84
    4       93     .07 .36 .57 .06 .06 .79 .09
   20       86     .07 .43 .50 .13 .03 .09 .69 .06
   16       82     .03 .23 .27 .47 .06 .06 .12 .76
    6       76     .50 .22 .17 .11 .06 .28 .09 .06 .50
   13       71     .17 .17 .42 .25 .03 .35 .21 .06 .35
    3       65     .09 .12 .21 .18 .24 .18 .06 .31 .63
    2       56     .05 .44 .22 .05 .22 .11 .14 .14 .57 .04
    1       54     .14 .25 .25 .14 .21 .59 .21 .15 .06
   19       46     .13 .13 .40 .13 .13 .07 .06 .41 .15 .21 .18
   14       41     .11 .28 .06 .56 .03 .18 .13 .19 .47 .06
   18       38     .31 .08 .15 .15 .08 .23 .25 .06 .38 .28 .03
    8       34     .17 .39 .28 .06 .11 .03 .66 .19 .03 .09
   10       29     .07 .50 .14 .21 .07 .26 .15 .15 .44
   11       22     .12 .38 .26 .16 .38 .06 .09 .31
   12       13     .28 .55 .17 .24 .20 .80
   17        5     .79 .11 .07 .04 .06 .68 .24 .03
   15        2     .47 .35 .06 .12 .03 .47 .41 .09

                   Category
                   1     2     3     4     5     6
Total  FRAC-     .129  .141  .208  .143  .141  .204
Total  ODDS-     .050  .301  .139  .123  .373  .014

Table 1 also illustrates the two measures usually used to indicate
the relative tightness of assessed distributions. The percentage of
true values falling into the extreme tails of the subjective distributions
(categories 1 and 6), called the surprise score, SS, should be
approximately 2%. For both the FRAC+ and FRAC- groups, the SS is well above this
figure (approximately 33% for both groups). For the ODDS groups, however,
the SS's are only slightly higher than 2%: approximately 4% and 6% for
the ODDS+ and ODDS- groups respectively.

The second measure of tightness, the interquartile score, IS,
(categories 3 and 4) should be approximately 50% for well-calibrated subjects.
Interquartile scores less than 50% indicate too tight distributions while
scores greater than 50% indicate too loose distributions. Table 1 shows
the IS's of 35%, 35%, 32%, and 26% for the FRAC+, FRAC-, ODDS+, and ODDS-
groups respectively were all well below 50%.
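
Both scores follow directly from the "Total" rows of Table 1. The Python
sketch below transcribes those totals and recomputes the two measures; the
function names `surprise_score` and `interquartile_score` are illustrative
choices rather than anything defined in the study.

```python
# Proportion of true values in categories 1 through 6, transcribed from
# the "Total" rows of Table 1.
totals = {
    "FRAC+": [0.128, 0.164, 0.187, 0.166, 0.155, 0.201],
    "FRAC-": [0.129, 0.141, 0.208, 0.143, 0.141, 0.204],
    "ODDS+": [0.024, 0.303, 0.148, 0.175, 0.332, 0.019],
    "ODDS-": [0.050, 0.301, 0.139, 0.123, 0.373, 0.014],
}


def surprise_score(cats):
    """Share of true values in the extreme tails (categories 1 and 6)."""
    return cats[0] + cats[5]


def interquartile_score(cats):
    """Share of true values between the quartiles (categories 3 and 4)."""
    return cats[2] + cats[3]


scores = {
    group: (round(100 * surprise_score(cats)),
            round(100 * interquartile_score(cats)))
    for group, cats in totals.items()
}
for group, (ss, iq) in scores.items():
    print(f"{group}: SS = {ss}%, IS = {iq}%")
```

Rounded to whole percentages, this gives SS values of 33, 33, 4, and 6 and
IS values of 35, 35, 32, and 26 for the FRAC+, FRAC-, ODDS+, and ODDS-
groups, in agreement with the figures quoted in the text.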

These measures are, however, based on data accumulated over all values
of true percentages. The effect of the true percentage on these measures
is evident in Table 1 and summarized in Table 2 where the SS's and the IS's
are given for middle range true values (40-60%) and for extreme true
values (1%-10% and 90%-99%). In the FRAC groups, the distributions
assessed on the extreme percentages are much too tight while distributions
assessed on the middle range of true percentages are too tight as measured
by the SS, but not as measured by the IS. The ODDS groups show only a
slight difference in the IS's with the distributions assessed on the
middle range of true percentages being nearer to the expected 50%.

Table 2

Surprise Scores and Interquartile Scores
for Middle and Extreme True Percentages

The actual spread of the distributions elicited by the two procedures
also can be compared by examining the slopes of median distributions of
each item as presented in Figure 1. Two such comparisons were made: the
interquartile range (IQR) and the maximum possible range (MPR). In the
FRAC groups, the slope of the MPR was the slope between points at the
1:99 and the 99:1 odds levels. In the ODDS groups, the slope of the MPR was
taken between points for the odds assessed for the lowest of the five
percentages given for each item and the odds assessed for the highest
percentage given. Interpolation was used to find the IQR in the ODDS
groups. The results of these comparisons are given in Table 3, panel a.
The symbol (=) indicates that the difference between the average slopes
was less than .10, while the inequality symbols show a difference greater
than .10 in the indicated direction. Although .10 was an arbitrary
criterion, the comparisons between the FRAC and ODDS groups suggest that
the subjective probability distributions of the ODDS groups are tighter
within the IQR and looser in the MPR, a finding consistent with the SS and
IS data.

TABLE 3

Comparison of Average Slopes for Each Item

(a) Between Group Comparison: FRAC Groups vs. ODDS Groups

                IQR             MPR
Item        Pos.   Neg.     Pos.   Neg.
  1          <      >        <      <
  2          >      >        <      <
  3          <      >        <      <
  4          =      >        <      >
  5          =      >        <      <
  6          >      <        <      <
  8          >      >        =      =
 10          >      =        <      <
 11          ?      ?        ?      ?
 12          >      >        <      <
 13          >      =        <      <
 14          >      =        <      <
 15          =      <        <      <
 16          >      ?        <      <
 17          >      <        =      <
 18          >      =        ?      ?
 19          >      >        <      <
 20          >      ?        <      <

Frequency
  >         11     10        0      0
  =          6      5        4      3
  <          1      3       14     15

Note:
> : Average slope in ODDS group is greater than FRAC group
= : Difference is less than .10
< : Average slope in FRAC group is greater than ODDS group
? : Entry illegible

(b) Within Group Comparison: IQR vs. MPR

                FRAC            ODDS
Item        Pos.   Neg.     Pos.   Neg.
  1          ?      ?        <      <
  2          ?      ?        <      <
  3          =      =        <      <
  4          ?      >        <      <
  5          =      =        <      <
  6          >      ?        <      <
  8          =      =        <      <
 10          ?      =        <      <
 11          ?      ?        =      <
 12          ?      ?        <      <
 13          >      =        <      <
 14          ?      ?        <      <
 15          ?      ?        <      <
 16          ?      ?        <      <
 17          ?      ?        <      <
 18          ?      <        <      <
 19          ?      ?        <      <
 20          ?      ?        <      <

Frequency
  >          2      1        0      0
  =         16     16        1      0
  <          0      1       17     18

Note:
> : Average slope within MPR is greater than IQR
= : Difference is less than .10
< : Average slope within IQR is greater than MPR
? : Entry illegible

A qualitative idea about the shape of the subjective probability
distributions can be determined by comparing the MPR slope with the IQR
slope for each item. An IQR slope greater than the MPR slope shows that
the subjective distribution has a higher density between the quartiles
than in the rest of the distribution, while an MPR slope greater than the
IQR slope shows lower density between the quartiles of the subjective
distribution. Approximately equal slopes suggest a near uniform distribution.
The results of these comparisons, shown in Table 3, panel b, show that for
the most part, the distributions assessed by the FRAC groups are near
uniform, while the distributions of the ODDS groups have higher density
in the middle of the distribution, a more typical shape for probability
distributions.


IV.  Discussion 


All  three  of  the  biases  investigated  were  found  to  some  degree  in  this 
study.  In  most  cases,  the  extent  of  the  bias  depended  on  the  value  of 
the  true  percentage  and/or  the  procedure  used  to  elicit  the  subjective 
probability  distributions. 

The underestimation bias seemed to be much stronger when the fractile
procedure was used to elicit the probability distributions than when the
odds procedure was used. The extent of this bias has not been previously
determined, although Seaver et al. (1975) suggested its possible existence.
Obviously, further study is warranted. However, should the underestimation
bias prove to be rather common, a possible method for alleviating it comes
to mind. Perhaps effects of the bias can be reduced or eliminated by
obtaining both positive and negative assessments and combining them. The
results of this study suggest that this may be a feasible approach to
cancelling the effects of this bias.
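
One way the combination might be carried out is sketched below. The study
does not specify a combining rule, so the simple averaging used here is an
assumption, and the function name is hypothetical. The positive estimate
is an assessed value for p%, and the negative estimate is an assessed
value for the complementary percentage, (100 - p)%.

```python
def combine_assessments(positive_estimate, negative_estimate):
    """Combine assessments of p% and of (100 - p)% into one estimate of p%.

    Returns (combined_estimate, shortfall), where shortfall is the amount
    by which the two assessments sum to less than 100. A positive
    shortfall indicates underestimation.

    Assumption: the two assessments are simply averaged after the
    negative one is converted to the positive scale.
    """
    shortfall = 100.0 - (positive_estimate + negative_estimate)
    implied_positive = 100.0 - negative_estimate
    combined = (positive_estimate + implied_positive) / 2.0
    return combined, shortfall


if __name__ == "__main__":
    # Hypothetical assessor: 40% for the positive item, 50% for the negative.
    print(combine_assessments(40.0, 50.0))
```

If both assessments fall short of their true values by the same amount,
the averaged estimate recovers the true percentage exactly; with unequal
shortfalls the bias is only partly cancelled.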

The surprise scores and interquartile scores that are traditionally
used to measure tightness seem to indicate that the distributions assessed
in this study were too tight. But other factors suggest such a simple
interpretation may be misleading. Can a distribution that covers a wide
range of the possible percentages and is nearly uniform really be called
tight? The answer to this question depends on the meaning given to the
concept of "tightness". The tightness measured by surprise scores and
interquartile scores is a relative tightness that compares assessed
distributions with actual occurrences. This type of tightness has received
widespread attention in research not only because of the importance of
knowing the correspondence between the subjective distributions and reality,
but also because measures of this correspondence are easily available.
Difficulties do exist with this concept of tightness. It only applies to
collections of distributions, never to a single distribution. In this
study, as in past studies, the tightness measures were determined not only
across the distributions of a single assessor, but also across assessors.
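
A minimal sketch of how the two relative measures could be computed over
a collection of distributions follows. It assumes each assessment is
summarized by its .01, .25, .75 and .99 fractiles; the function name, the
tuple layout, and the treatment of values lying exactly on a fractile are
assumptions made for illustration.

```python
def tightness_scores(assessments):
    """Compute (surprise_score, interquartile_score) as percentages.

    `assessments` is a list of tuples
        (true_value, f01, f25, f75, f99)
    where f01..f99 are the assessed fractiles of one distribution.

    surprise score      : percent of true values outside [f01, f99]
    interquartile score : percent of true values inside  [f25, f75]
    """
    n = len(assessments)
    if n == 0:
        raise ValueError("at least one assessment is required")
    surprises = sum(
        1 for true, f01, _f25, _f75, f99 in assessments
        if true < f01 or true > f99
    )
    inside_iqr = sum(
        1 for true, _f01, f25, f75, _f99 in assessments
        if f25 <= true <= f75
    )
    return 100.0 * surprises / n, 100.0 * inside_iqr / n


if __name__ == "__main__":
    # Hypothetical fractile assessments, paired with true percentages.
    data = [
        (46, 30, 40, 50, 70),   # true value inside the interquartile range
        (7, 10, 20, 30, 50),    # surprise: below the .01 fractile
        (98, 40, 60, 80, 95),   # surprise: above the .99 fractile
        (35, 10, 20, 30, 60),   # neither a surprise nor inside the IQR
    ]
    print(tightness_scores(data))
```

For a well-calibrated collection of distributions, the surprise score
would be near 2% and the interquartile score near 50%. Both scores are
defined only over a collection, which is the limitation noted above.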

The concept of tightness implied by examining the range and shape of
single distributions is more absolute. Flat distributions covering a wide
range of values would not normally be considered too tight. But questions
such as "What is flat?" and "What is a wide range?" make this a difficult
concept with which to deal. Therefore, although in an absolute sense the
distributions assessed in this experiment may not be "too tight", the
following discussion adopts the more traditional meaning of tightness for
ease of discussion and comparison with previous findings.

The results of this experiment concerning both the tightness of
distributions as measured by surprise and interquartile scores and the
conservatism bias are generally in agreement with past results. Distributions
elicited by both fractile and odds procedures exhibited conservatism.
Tightness, as measured by surprise scores summed across all true values,
was apparent in the FRAC groups but not in the ODDS groups, a finding
consistent with the results of Seaver et al. The interquartile scores, again
summed over all true percentages, were much lower than 50% for both groups,
also seemingly indicating the distributions were too tight. This result
contrasts with the results of Seaver et al., who found interquartile scores
near 50% for both elicitation procedures. No satisfactory explanation of
this discrepancy seems to exist. However, the difference in interquartile
scores should not necessarily be surprising, since the interquartile
scores in this study are in the same general range as those obtained by
Alpert and Raiffa (1969) and Schaefer and Borcherding (1973).

An important way in which this study differs from most past studies
of the biases in subjective probability distributions is that the method
used to document the biases allows some determination of the degree to
which the conservatism bias may be partially responsible for surprise scores
and interquartile scores that have traditionally been used to show the
tightness of distributions. For example, when the true percentage is
extremely low, conservatism will tend to displace the assessed distribution
toward the right; so simply because of conservatism, the surprise score may
be high.

The extent to which conservatism influences the measures traditionally
used to assess the tightness of distributions depends both on the measure
used (surprise score or interquartile score) and on the procedure used to
assess the distributions. Although direct measurement of the relative
contribution of conservatism to the surprise scores and interquartile
scores was not possible, some inferences can be drawn. Conservatism could
not account for the high surprise scores in the FRAC groups and the low
interquartile scores in the ODDS groups when the true percentages were in
the middle range (40%-60%). The similarity in surprise scores and
interquartile scores between middle range and extreme true percentages for the
ODDS groups suggests something other than conservatism produces these
scores even with extreme true percentages when the odds elicitation
procedure is used. The much higher surprise scores and lower interquartile
scores for extreme true percentages than for true percentages in the middle
range indicated considerable conservatism in the FRAC groups.

Because the procedures used to elicit the subjective probability
distributions seem to have an effect on the extent of these biases, it
becomes important to know which elicitation procedures may reduce or
eliminate biases. The results of this study suggest that with regard to most
of the biases, the odds procedure is better than the fractile procedure.
The odds procedure does not produce the underestimation that the fractile
method does. The surprise scores and interquartile scores are less
dependent on the true percentage when the odds procedure rather than the
fractile procedure is used. The odds procedure produces many fewer
surprises than the fractile method. Only on the interquartile score does
the fractile elicitation procedure seem to lead to better calibrated
distributions than the odds procedure.

What causes the differences in the distributions elicited by the two
methods? A rather simple phenomenon may explain the difference in surprise
scores. A tendency to avoid extreme responses would lead to both the large
number of surprises in the FRAC groups and the small number of surprises
in the ODDS groups. Since the responses of the FRAC groups are percentages,
avoiding extreme percentages could lead to the .01 and .99 fractiles being
too close together. However, in the odds elicitation procedure the
responses are odds. A surprise can only occur if a subject assigns odds of
at least 99:1 that the true percentage is greater or less than the given
percentage when the given percentage is the true percentage. If the
subjects avoid extreme responses, very few responses will be as large as
99:1, so there is little chance of a surprise occurring.

Consideration of the odds that must be assigned to the true percentage
in the odds elicitation procedure for the true percentage to fall within
the interquartile range also may explain why the interquartile scores were
so low. Only if the odds assigned are 3:1 or less will the true percentage
fall within the interquartile range. The high number of true percentages
falling into categories 2 and 4 (see Table 1) may simply reflect the fact
that subjects are more likely to make responses between 3:1 and 99:1.
Perhaps suitable training in the use of smaller odds, particularly
fractional values between 1:1 and 2:1, would help eliminate this bias. Use
of probabilities rather than odds as the measures of uncertainty might
also help with this problem in the interquartile range. Seaver et al.
found the interquartile scores were higher when probabilities were used
rather than odds, but in that case the interquartile scores were too high.
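
The two odds thresholds discussed above (99:1 for a surprise, 3:1 for the
interquartile range) follow from converting odds to cumulative
probability. The sketch below illustrates this. The function names and
category labels are hypothetical, and it assumes odds are always stated
with the larger number first, so that a response of a:1 has a >= 1.

```python
def odds_to_probability(odds_for, odds_against=1.0):
    """Convert odds of odds_for:odds_against to a probability."""
    return odds_for / (odds_for + odds_against)


def classify_odds_response(odds_for, odds_against=1.0):
    """Classify the odds a subject assigns at the true percentage.

    'surprise'      : odds of at least 99:1 (probability >= .99)
    'interquartile' : odds of 3:1 or less   (probability <= .75)
    'between'       : anything in between, i.e. the region between the
                      interquartile range and the extreme tails
    """
    if odds_for >= 99 * odds_against:
        return "surprise"
    if odds_for <= 3 * odds_against:
        return "interquartile"
    return "between"


if __name__ == "__main__":
    # Hypothetical responses, stated as a:1.
    for odds in (1.5, 3, 10, 99):
        print(odds, odds_to_probability(odds), classify_odds_response(odds))
```

A subject who rarely gives odds below 3:1 or as high as 99:1 would
therefore produce few surprises and also few true values inside the
interquartile range, which is the pattern described for the ODDS groups.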

Since this experiment was conducted primarily to explore several
possible biases and their relationship to the procedures used to elicit
the distributions and the true values of the unknown percentages, most of
the findings are suggestive rather than conclusive. The results do strongly
suggest the existence of a previously unconfirmed bias, underestimation.
They are also generally consistent with previous results with respect to
conservatism and the tightness of distributions, but suggest possible
interactions between the measurements of these biases. Following Seaver
et al. (1975), there are also indications that the procedure used to elicit
the subjective probability distributions influences the extent of the
biases. Generally, the ODDS procedure leads to less biased distributions
than the FRAC procedure. All of these suggestive findings deserve further
exploration in attempts to discover the processes by which people assign
subjective probability distributions to unknown variables.
VI.  Appendix

Questions (Positive Items)

1.  What percentage of the total world water area is contained in the
Pacific Ocean?  46%

2.  In 1971, what percentage of the total U.S. electrical energy was
produced by coal?  44%

3.  What percentage of all U.S. military personnel were in the Army in 1973?
35%

4.  What percentage of all U.S. coastline is in California?  7%

5.  What percentage of the total world gold production was produced in the
U.S. in 1970?  4%

6.  As of April, 1973, what percentage of all U.S. Federal employees were
employed by the U.S. Postal Service?  24%

8.  What percentage of all natural gas marketed in the world was produced
in North and Central America during 1971?  66%

10.  What percentage of all members of the Muslim religion lived in Asia in
1972?  71%

11.  During the period from 1870 to 1971, what percentage of immigrants to
the U.S. came from Europe?  78%

12.  What percentage of registered voters voted in the 1972 U.S. general
(Presidential) election?  87%

13.  What percentage of the total U.S. advertising expenditures went to
newspapers in 1970?  29%

14.  During the period from 1950 to 1972, what percentage of all people
examined for entry into the U.S. armed forces were found acceptable?
59%

15.  As of 1973, what percentage of U.S. households had television?  98%

16.  What percentage of the population of the city of Los Angeles was black
in 1970?  18%

17.  During 1971, what percentage of all television air time in France was
occupied by programs produced in France?  95%

18.  What percentage of U.S. Presidents have been lawyers?  62%

19.  In  1972,  what  percentage  of  all  automobiles  produced  in  the  U.S.  were 
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